Experiments in parallel-text based grammar induction

نویسنده

  • Jonas Kuhn
چکیده

This paper discusses the use of statistical word alignment over multiple parallel texts for the identification of string spans that cannot be constituents in one of the languages. This information is exploited in monolingual PCFG grammar induction for that language, within an augmented version of the inside-outside algorithm. Besides the aligned corpus, no other resources are required. We discuss an implemented system and present experimental results with an evaluation against the Penn Treebank.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Bayesian Model for Learning SCFGs with Discontiguous Rules

We describe a nonparametric model and corresponding inference algorithm for learning Synchronous Context Free Grammar derivations for parallel text. The model employs a Pitman-Yor Process prior which uses a novel base distribution over synchronous grammar rules. Through both synthetic grammar induction and statistical machine translation experiments, we show that our model learns complex transl...

متن کامل

Synchronous Constituent Context Model for Inducing Bilingual Synchronous Structures

Traditional Statistical Machine Translation (SMT) systems heuristically extract synchronous structures from word alignments, while synchronous grammar induction provides better solutions that can discard heuristic method and directly obtain statistically sound bilingual synchronous structures. This paper proposes Synchronous Constituent Context Model (SCCM) for synchronous grammar induction. Th...

متن کامل

Towards High Speed Grammar Induction on Large Text Corpora

In this paper we describe an e cient and scalable implementation for grammar induction based on the EMILE approach ([2], [3],[4], [5], [6]). The current EMILE 4.1 implementation ([11]) is one of the rst e cient grammar induction algorithms that work on free text. Although EMILE 4.1 is far from perfect, it enables researchers to do empirical grammar induction research on various types of corpora...

متن کامل

On Bootstrapping of Linguistic Features for Bootstrapping Grammars

We discuss a cue-based grammar induction approach based on a parallel theory of grammar. Our model is based on the hypotheses of interdependency between linguistic levels (of representation) and inductability of specific structural properties at a particular level, with consequences for the induction of structural properties at other linguistic levels. We present the results of three different ...

متن کامل

A Systematic Comparison between Inversion Transduction Grammar and Linear Transduction Grammar for Word Alignment

We present two contributions to grammar driven translation. First, since both Inversion Transduction Grammar and Linear Inversion Transduction Grammars have been shown to produce better alignments then the standard word alignment tool, we investigate how the trade-off between speed and end-to-end translation quality extends to the choice of grammar formalism. Second, we prove that Linear Transd...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004